Measuring urban morphology - a guideΒΆ

Road morphology

Road morphology measures are beginning to offer insight into how particular types of road layout affect bikeshare use. Recent research suggests that road length between docking stations and road straightness affect the distance and amount of trips generated. The density of street junctions is one measure indicative of road connectivity. Variation in road connectivity can impact the flow of traffic, determining the efficiency and safety of a journey, influencing route and mode choice. Hence, measuring a road network's connectivity, shape and pattern offers novel insight regarding how people navigate urban space by bicycle.

Building morphology

In addition to the function of buildings, are measures of their physical form that describe the visual component of urban space, including building width, density, height, squareness, circularity and elongation. Together, these measures can define gridded city blocks (consistent widths, high-density), historical centres (narrow buildings, varied heights, high-density), business districts (taller buildings, high-density), retail centres (wider buildings), or different types of residential neighbourhoods (density and height variation accompanying housing types). Building elongation can influence the perception of street enclosure, which Ewing and Handy (2009) identified as an important factor in walkability. More elongated buildings parallel to the street can create a stronger sense of enclosure, potentially encouraging walking. Regarding bikeshare use, however, there is little known regarding how it varies according to building morphology.

How?

To measure the morphological characteristics of roads, it is helpful to measure them in relation to their network. The "network", referring to the routes within a system of locations, denoted as nodes. A route is a single link between two nodes, part of a larger network. In mathematical terms, this network is considered as a graph of nodes and edges. The efficiency of transport flows between nodes in a graph depends partly on its topology, which is to say, its layout of nodes and edges. Formally, a graph G is a set of vertices (nodes) v, connected by edges (links) e, leading to G = (v,e). Certain structures will be more efficient (i.e. offering direct routes) than others and some will be more resiliant to disruption (i.e. traffic) by maintaining network connetivity through route alternatives. As graphs become more complex, it is not obvious when comparing cities visually, which is more efficient or accessible. Graph theory offers several measures and indices, used to assess the efficiency and accessibility of a transport network. In this research, graph measures are used to: calculate how connected the roads in proximity to a bikeshare dockings station are to the rest of the network, how easily other locations can be reached within a certain threshold, the ease in which alternative routes can be taken from a station and the continuity of roads egressing a station.

The spatial footprints of the buildings are represented as polygons, measuring their dimensions (area, length) and shape (compactness, number of corners, rectangularity, straightness).

From these various measures of road and building morphology, we can gain a deeper representation of the underlying structure of urban space. Both the graphs and the polygons can be converted to tablular form, which we can treat the same way we do any other tabular data. This means we can use them as inputs to a statistical model, or as variables for generating clusters. For cluster analysis, it is necessary to define a spatial unit serving as observations for the input data. Ideally, the unit should be granular and represent the underlying morphology of the city. To achieve this, the process of morphological tessellation is performed.

Morphological tessellation

Morphological tessellation is a process used to partition cities into spatial units that capture the underlying structure of the urban landscape by combining roads and buildings. The process involves three primary steps. First, building polygons are spatially joined with their closest road segment, creating initial spatial units that represent the streetscape observable by pedestrians. Second, a distance band of 100m is applied around these combined units to approximate limits within visible range. This distance is based on approaches by Araldi and Fusco (2019) and Fleischmann and Arribas-Bel (2022). Third, the resulting areas are subdivided (or 'tessellated') into Voronoi polygons, based on the centroids of the buildings. This ensures the final spatial units are indivisible and exhaustive across space. This method creates contiguous spatial units that capture the shape and spatial extent of both roads and buildings, providing a meaningful representation of urban morphology. A process of spatial smoothing is performed to minimise the impact of outliers and better represent the context of the area (Alexiou et al., 2016). A distance band is calculated for each unit to assign the neighbouring units falling within its 500m radius. The spatial lag is obtained by taking the average of each unit's neighbour values and calculating the interquartile range of these values (Fleischmann and Arribas-Bel, 2022).

Why?ΒΆ

In this article I will run through some of the steps I have taken to measure and store this data in a format which is suitable for modelling. It is intended as an example that is applicable to any city. The data focusses on one city, but was done for several in my wider project.

Initialising the libraries, data and functionsΒΆ

The workhorses for performing the analysis are networkx and momepy for calculating graph statistics and geopandas for manipulating geo dataframes.

I have already downloaded road network and buildings data from the Ordinance Survey website.

InΒ [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import contextily as cx
import matplotlib.pyplot as plt
from shapely.geometry import Point, MultiPoint, MultiPolygon, Polygon, LineString, MultiLineString
import networkx as nx
import momepy as mp
import tqdm
import libpysal
InΒ [Β ]:
# OS ROADS # 
OS_rds =  gpd.read_file("C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/ordinance_survey_roads_BBs.gpkg")
OS_rds = OS_rds[['city', 'geometry']]
OS_rds = gpd.GeoDataFrame(OS_rds)
OS_rds = OS_rds.explode()

lrds = OS_rds.loc[OS_rds['city']=='Leicester']

lrds  = mp.remove_false_nodes(lrds)
lrds.geometry = mp.close_gaps(lrds, 2)
lrds['city'] = 'Leicester'
lrds = lrds[['geometry']]
lrds['nID'] = mp.unique_id(lrds)
InΒ [30]:
# # # CITY BOUNDARIES #
boundaries = gpd.read_file("C:/Users/patri/Documents/CDRC Bikeshare Station Data/spatial_data_pre_process/combined_boundaries_updated_poole_christchurch_clipped.gpkg")
boundaries = boundaries.rename(columns={'city':'boundary_city'})
boundaries['boundary_city'] = boundaries['boundary_city'].replace({'Glasgow City': 'Glasgow', 'City of Leicester': 'Leicester'})
boundaries = boundaries.loc[boundaries['boundary_city']=='Leicester']
Out[30]:
boundary_city geometry
4 Leicester MULTIPOLYGON (((464453.396 306785.703, 464443....
InΒ [Β ]:
# BUILDINGS  # 
lbuildings = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/leicester_buildings_OS.gpkg')
InΒ [Β ]:
# BUILT UP AREAS # 
built_up_areas = gpd.read_file("C:/Users/patri/Downloads/OS_Open_Built_Up_Areas_GeoPackage/OS_Open_Built_Up_Areas.gpkg")
built_up_areas.to_crs(27700, inplace=True)
built_up_areas = built_up_areas[['geometry']]
InΒ [Β ]:
def building_id(buildings):
    buildings = buildings.to_crs(27700)
    buildings['uID'] = mp.unique_id(buildings)
    buildings = buildings[['uID', 'geometry']]

    return buildings

lbuildings = building_id(lbuildings)
InΒ [Β ]:
# TESSELATION FUNCTION # 

def tesselate_func(buildings, streets_gdf, letter):

    limit = mp.buffered_limit(buildings, 100)

    tessellation = mp.Tessellation(buildings, "uID", limit, verbose=False, segment=1)
    tessellation = tessellation.tessellation
    print("tesseltation DONE")
    tessellation = tessellation.overlay(built_up_areas, how='intersection')
    
    print("tesseltation INTERSECTION DONE")

    #link streets
    buildings = buildings[['uID', 'geometry']]
    buildings = buildings.sjoin_nearest(streets_gdf, max_distance=1000)
    buildings = buildings.drop_duplicates("uID").drop(columns="index_right")
    tessellation = tessellation.merge(buildings[['uID', 'nID']], on='uID', how='left')
    tessellation = tessellation.drop_duplicates("uID")
    
    # Dimensions
    buildings["area"] = buildings.area
    tessellation["area"] = tessellation.area
    streets_gdf["length"] = streets_gdf.length
    # Shape
    buildings['eri'] = mp.EquivalentRectangularIndex(buildings).series
    print('EquivalentRectangularIndex DONE')
    buildings['elongation'] = mp.Elongation(buildings).series
    print('elongation DONE')
    tessellation['convexity'] = mp.Convexity(tessellation).series
    print('convexity DONE')
    streets_gdf["linearity"] = mp.Linearity(streets_gdf).series
    print('linearity DONE')

    # Spatial distribution
    buildings["shared_walls"] = mp.SharedWallsRatio(buildings).series
    print('shared_walls DONE')

    # Generate spatial weights matrix using libpysal.

    queen_1 = libpysal.weights.contiguity.Queen.from_dataframe(tessellation, ids="uID", silence_warnings=True)
    tessellation["neighbors"] = mp.Neighbors(tessellation, queen_1, "uID", weighted=True, verbose=False).series
    tessellation["covered_area"] = mp.CoveredArea(tessellation, queen_1, "uID", verbose=False).series

    buildings["neighbor_distance"] = mp.NeighborDistance(buildings, queen_1, "uID", verbose=False).series

    queen_3 = mp.sw_high(k=3, weights=queen_1)
    buildings_q1 = libpysal.weights.contiguity.Queen.from_dataframe(buildings, silence_warnings=True)

    # buildings['interbuilding_distance'] = mp.MeanInterbuildingDistance(buildings, queen_1, 'uID', queen_3, verbose=False).series
    buildings['adjacency'] = mp.BuildingAdjacency(buildings, queen_3, 'uID', buildings_q1, verbose=False).series
    print('adjacency DONE') 
    buildings['squarenes'] = mp.Squareness(buildings).series
    print('squarenes DONE') 
    buildings['squarecompactness'] = mp.SquareCompactness(buildings).series
    print('squarecompactness DONE') 
    buildings['corners'] = mp.Corners(buildings).series
    print('corners DONE') 

    profile = mp.StreetProfile(streets_gdf, buildings)
    streets_gdf["width"] = profile.w
    print('width DONE') 
    streets_gdf["width_deviation"] = profile.wd
    print('width_deviation DONE') 
    streets_gdf["openness"] = profile.o
    print('openness DONE')

    # Intensity
    tessellation['area_ratio'] = mp.AreaRatio(tessellation, buildings, 'area', 'area', 'uID').series
    print('area_ratio DONE')

    # Connectivity
    graph = mp.gdf_to_nx(streets_gdf)

    graph = mp.node_degree(graph)
    print('node degree DONE')
    graph = mp.closeness_centrality(graph, radius=2000, distance="mm_len")
    print('closeness DONE')
    graph = mp.meshedness(graph, radius=2000, distance="mm_len")
    print('meshedness DONE')
    graph = mp.gamma(graph, radius=2000, name='gamma', distance="mm_len", verbose=True)
    print('gamma DONE')
    graph = mp.edge_node_ratio(graph, radius=500, name='edge_node_ratio', distance="mm_len", verbose=True)
    print('edge node ratio DONE')
    nodes, streets = mp.nx_to_gdf(graph)

    streets['street_orientation'] = mp.NeighboringStreetOrientationDeviation(streets).series
    print('street_orientation ratio DONE')

    buildings["nodeID"] = mp.get_node_id(buildings, nodes, streets, "nodeID", "nID")
    # Link all data together (to tessellation cells or buildings).

    # tessellation
    streets = streets[['nID', 'street_orientation','geometry']]
    merged = tessellation.merge(buildings.drop(columns=['nID', 'geometry']), on='uID')
    merged = merged.merge(streets.drop(columns='geometry'), on='nID', how='left')
    merged = merged.merge(nodes.drop(columns='geometry'), on='nodeID', how='left')

    return buildings, streets, nodes, tessellation, merged
InΒ [Β ]:
lbuildings, lstreets, lnodes, ltessellation, lmerged = tesselate_func(lbuildings, lrds, "L")

Visualising the dataΒΆ

InΒ [2]:
merged = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/L_merged_tessellation_100m.gpkg', driver='GPKG')
buildings = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/L_buildings_tesselate_stats_100m.gpkg', driver='GPKG')
streets = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/L_streets_gdf_tesselate_stats_100m.gpkg', driver='GPKG')
tessellation = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/L_tesselation_100m.gpkg', driver='GPKG')
nodes = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/L_nodes_tesselation_100m.gpkg', driver='GPKG')
InΒ [4]:
nodes.columns
Out[4]:
Index(['degree', 'cds_len', 'closeness2000', 'meshedness', 'gamma2000',
       'edge_node_ratio', 'nodeID', 'closeness_global', 'geometry'],
      dtype='object')
InΒ [6]:
nodes
Out[6]:
degree cds_len closeness2000 meshedness gamma2000 edge_node_ratio nodeID closeness_global geometry
0 3 15129.056047 0.000035 0.111384 0.408074 1.000000 0 0.000148 POINT (457073.270 300462.100)
1 3 15733.159494 0.000035 0.110479 0.407468 1.020000 1 0.000147 POINT (457070.000 300432.000)
2 3 24310.011935 0.000027 0.108954 0.406609 1.071429 2 0.000151 POINT (463239.000 303470.000)
3 3 22144.062804 0.000026 0.111880 0.408602 1.117647 3 0.000146 POINT (463314.000 303417.000)
4 4 17293.273810 0.000039 0.146743 0.431569 1.291667 4 0.000147 POINT (458455.000 300635.000)
... ... ... ... ... ... ... ... ... ...
13406 1 11783.593790 0.000016 0.101806 0.402186 0.800000 13406 0.000183 POINT (464448.000 305431.000)
13407 3 9593.188961 0.000021 0.090656 0.394615 1.074074 13407 0.000001 POINT (464808.000 304694.000)
13408 1 4625.099820 0.000002 0.044776 0.372549 0.875000 13408 0.000001 POINT (464906.280 298069.010)
13409 1 7022.881817 0.000016 0.111298 0.408530 0.923077 13409 0.000000 POINT (464927.010 305723.750)
13410 1 6282.603608 0.000007 0.061224 0.376694 0.933333 13410 0.000000 POINT (464960.110 303724.480)

13411 rows Γ— 9 columns

Checking the road networkΒΆ

First, let's check the node measures of the road network, in the form of points.

InΒ [20]:
fig, axs = plt.subplots(2, 4, figsize=(32, 16))

# Define a function to style each subplot
def style_subplot(ax, title):
    ax.set_title(title, fontsize=16)
    ax.axis('off')  # This removes all axis lines, ticks, and labels

# Plot each subplot with reduced point size and custom styling
variables = [
    "closeness2000", "closeness_global", "meshedness", "degree",
    "cds_len", "edge_node_ratio", "gamma2000"
]

for i, var in enumerate(variables):
    row = i // 4
    col = i % 4
    if var:  # Only plot if there's a variable name
        nodes.plot(var, ax=axs[row, col], scheme='natural_breaks', legend=True, markersize=5)
        style_subplot(axs[row, col], var.replace("_", " ").title())
    else:
        axs[row, col].axis('off')  # Turn off empty subplots

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()
No description has been provided for this image

The closeness centrality measured within a 2000m radius of each node has captured the conentric structure of the city, while the global measurement has no obvious pattern. This suggests the global measure may have too much variation to be meaningful. Meshedness and Gamma are extremely similar, indicating some redundancy if including them both in the model. The effect of node degree is quite subtle and isolated to small areas, while edge to node ratio has a higher concentration of high values in and around the urban core. Cul de sac length is intuitively higher in peripheral areas, indicative of suburban residential neighbourhoods.

Checking the road network measures from road segmentsΒΆ

InΒ [31]:
streets = gpd.clip(streets, boundaries)
InΒ [32]:
fig, axs = plt.subplots(2, 3, figsize=(24, 16))

# Define a function to style each subplot
def style_subplot(ax, title):
    ax.set_title(title, fontsize=16)
    ax.axis('off')  # This removes all axis lines, ticks, and labels

# Plot each subplot with reduced point size and custom styling
variables = [
    "linearity", "length", "width", "width_deviation",
    "continuity", "openness"
]

for i, var in enumerate(variables):
    row = i // 3
    col = i % 3
    if var:  # Only plot if there's a variable name
        streets.plot(var, ax=axs[row, col], scheme='natural_breaks', legend=True, markersize=5)
        style_subplot(axs[row, col], var.replace("_", " ").title())
    else:
        axs[row, col].axis('off')  # Turn off empty subplots

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()
No description has been provided for this image

Checking the building footprintsΒΆ

InΒ [21]:
buildings.columns
Out[21]:
Index(['uID', 'nID', 'area', 'eri', 'elongation', 'neighbor_distance',
       'shared_walls', 'swr', 'interbuilding_distance', 'adjacency',
       'squarenes', 'squarecompactness', 'corners', 'comp',
       'building_orientation', 'cell_alignment', 'nodeID', 'geometry'],
      dtype='object')
InΒ [5]:
fig, axs = plt.subplots(2, 3, figsize=(24, 16))

# Define a function to style each subplot
def style_subplot(ax, title):
    ax.set_title(title, fontsize=16)
    ax.axis('off')  # This removes all axis lines, ticks, and labels

# Plot each subplot with reduced point size and custom styling
variables = [
    "eri", "elongation", "neighbor_distance",  'building_orientation',
    "shared_walls", "squarecompactness"
]

for i, var in enumerate(variables):
    row = i // 3
    col = i % 3
    if var:  # Only plot if there's a variable name
        buildings.plot(var, ax=axs[row, col], scheme='natural_breaks', legend=True)
        style_subplot(axs[row, col], var.replace("_", " ").title())
    else:
        axs[row, col].axis('off')  # Turn off empty subplots

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()
No description has been provided for this image

Checking the tessellated unitsΒΆ

First, we can create an example of a tessellated unit, zoomed in to see how each step works to create the final unit. We buffer the buildings by 100 metres to set the limit of the tessellation.

InΒ [35]:
from shapely.geometry import box
# BBS
city_bbs = gpd.read_file("C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/bounding_boxes_stations_boundaries.gpkg")
city_bbs.to_crs(4326, inplace=True)
lbb = city_bbs.loc[city_bbs['city']=='Leicester']
lbb.to_crs(27700, inplace=True)
lbb['centroid'] = lbb.geometry.centroid
lbb.set_geometry('centroid', inplace=True)
c:\Users\patri\Documents\spopt_venv\lib\site-packages\geopandas\geodataframe.py:1528: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)
c:\Users\patri\Documents\spopt_venv\lib\site-packages\geopandas\geodataframe.py:1528: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)
InΒ [38]:
limit = mp.buffered_limit(buildings, 100)
l_tess = mp.Tessellation(buildings, 'uID', limit, segment=1) # create tesselations from all buildings
l_tess = l_tess.tessellation

# Create the bounding box
def create_bbox(point, distance):
    minx, miny, maxx, maxy = point.buffer(distance).bounds
    return box(minx, miny, maxx, maxy)

# Apply the function to create a 300m x 300m bounding box
distance = 150  # 150 meters in each direction to create a 300m x 300m box
lbb['bbox'] = lbb.geometry.apply(lambda point: create_bbox(point, distance))

# Create a new GeoDataFrame with the bounding boxes
bbox_gdf = gpd.GeoDataFrame(geometry=lbb['bbox'], crs=lbb.crs)

# Filter buildings and roads to the bounding box
buildings_in_bbox = gpd.overlay(buildings, bbox_gdf)
buildings_in_bbox_buff = buildings_in_bbox.buffer(20)
buildings_in_bbox_buff = gpd.GeoDataFrame(buildings_in_bbox_buff)
buildings_in_bbox_buff.rename(columns={0:'geometry'}, inplace=True)
buildings_in_bbox_buff.set_geometry('geometry', inplace=True)
buildings_in_bbox_buff = buildings_in_bbox_buff.dissolve()
roads_in_bbox = gpd.overlay(streets, bbox_gdf)
l_tess_in_bbox = gpd.overlay(l_tess, buildings_in_bbox_buff)
Inward offset...
Generating input point array...
Generating Voronoi diagram...
Generating GeoDataFrame...
Dissolving Voronoi polygons...
C:\Users\patri\AppData\Local\Temp\ipykernel_11772\2741401941.py:2: UserWarning: Tessellation does not fully match buildings. 10 element(s) collapsed during generation - unique_id: {25986, 48455, 26379, 26444, 51162, 48348, 48483, 26349, 48365, 2230}.
  l_tess = mp.Tessellation(buildings, 'uID', limit, segment=1)
C:\Users\patri\AppData\Local\Temp\ipykernel_11772\2741401941.py:2: UserWarning: Tessellation contains MultiPolygon elements. Initial objects should  be edited. `unique_id` of affected elements: [25863, 3296].
  l_tess = mp.Tessellation(buildings, 'uID', limit, segment=1)
c:\Users\patri\Documents\spopt_venv\lib\site-packages\geopandas\geodataframe.py:1528: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[38], line 24
     22 buildings_in_bbox_buff.set_geometry('geometry', inplace=True)
     23 buildings_in_bbox_buff = buildings_in_bbox_buff.dissolve()
---> 24 roads_in_bbox = gpd.overlay(lrds, bbox_gdf)
     25 l_tess_in_bbox = gpd.overlay(ltessellation, buildings_in_bbox_buff)

NameError: name 'lrds' is not defined
InΒ [44]:
# Plot the results
fig, ax = plt.subplots(2, 2, figsize=(10, 10))
#1 
roads_in_bbox.plot(ax=ax[0, 0], color='grey', linewidth=1)
#2
buildings_in_bbox.plot(ax=ax[0, 1], color='gray', alpha=0.5)
#3
buildings_in_bbox.plot(ax=ax[1, 0], color='gray', alpha=0.5)
roads_in_bbox.plot(ax=ax[1, 0], color='grey', linewidth=1)
#4
buildings_in_bbox.plot(ax=ax[1,  1], color='gray', alpha=0.5)
roads_in_bbox.plot(ax=ax[1, 1], color='grey', linewidth=1, alpha=0.7)
l_tess_in_bbox.plot(ax=ax[1,1], linewidth=1.2, edgecolor='blue', alpha=0.5)
buildings_in_bbox_buff.plot(ax=ax[1,1], alpha=0.1, color='gray', edgecolor='grey')

ax[0,0].set_title("A) Road segments", fontsize=12)
ax[0,1].set_title("B) Building polygons", fontsize=12)
ax[1,0].set_title("C) Roads and building polygons joined", fontsize=12)
ax[1,1].set_title("D) Tessellated units, bounded by a distance-band", fontsize=12)
ax[0,0].axis('off') 
ax[0,1].axis('off')  
ax[1,0].axis('off')  
ax[1,1].axis('off')  
Out[44]:
(458826.215508731, 459200.215508731, 304404.94278221956, 304778.94278221956)
No description has been provided for this image

The building and road morphology measures have been joined to the tessellated units, meaning that their combined spatial structure is captured in one, contiguous unit. Below, we see the same spatial distribution of building and road measures in this format.

InΒ [6]:
fig, axs = plt.subplots(2, 3, figsize=(24, 16))

# Define a function to style each subplot
def style_subplot(ax, title):
    ax.set_title(title, fontsize=16)
    ax.axis('off')  # This removes all axis lines, ticks, and labels

# Plot each subplot with reduced point size and custom styling
variables = [
    "closeness2000", "cds_len", "neighbor_distance",  'linearity',
    "width_deviation", "squarecompactness"
]

for i, var in enumerate(variables):
    row = i // 3
    col = i % 3
    if var:  # Only plot if there's a variable name
        merged.plot(var, ax=axs[row, col], scheme='natural_breaks', legend=True)
        style_subplot(axs[row, col], var.replace("_", " ").title())
    else:
        axs[row, col].axis('off')  # Turn off empty subplots

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()
No description has been provided for this image

Drawing meaningful insights from the dataΒΆ

In their current format, there is a lot of spatial heterogeneity. Consequently, it could be challenging for a statistical model to generate reliable estimates, potentially impacted by outliers. For clustering algorithms, it may be challenging to detect patterns within the data. To address this, spatial weights can be generated which capture the relationship between individual units and their surrounding environment. Using these weights, a form of spatial smoothing is performed that retain the underlying structure whilst minimising extremes in variation. Otherwise known as the 'spatial lag'.

There are different weighting strategies, including the contiguous neighbours to each polygon (queen and rook), or distance-bands, referencing each neighbouring point within a specified distance threshold. We will use the distance-band approach.

500 metre distance bands are generated for each unit, which we then calculate the interquartile range of the values falling within that threshold. This diminishes the effect of outliers while retaining the overall tendency of the distribution. The function used to produce these weights and apply them to the geodataframe is shown below.

InΒ [Β ]:
from libpysal.weights import DistanceBand
om500 = DistanceBand.from_dataframe(merged, 500, geom_col='centroid', ids='xID')

def create_iqr_columns(df, columns, weight_matrix, id_col, rng=(25, 75)):
    
    columns_to_process = [col for col in df.columns if col != id_col]
    
    for column in columns_to_process:
        new_column_name = f"{column}_IQR_db500"
        print(f"Processing column: {column}")
        print(f"Weight matrix: {weight_matrix}")
        
        try:
            df[new_column_name] = mp.Range(df, column, weight_matrix, id_col, rng=rng).series
        except Exception as e:
            print(f"Error processing column {column}: {e}")
InΒ [Β ]:
IQR_db500_df = create_iqr_columns(merged, cluster_cols, om500, 'xID') # I have ommited the cluster columns originally included for brevity.
InΒ [11]:
cluster_data = pd.read_csv('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/cluster_data_distance_bands_100m_buffer_2.csv')

# GEOM
merged_all_geom = gpd.read_file('C:/Users/patri/Documents/CDRC Bikeshare Station Data/station_optimisation/data_pre_process/tessellation/merged_all_geom_100m_buffer_2.gpkg', driver='GPKG')
IQR_db500_df = pd.concat( [cluster_data,merged_all_geom[['city','geometry']]], axis=1)
IQR_db500_df = gpd.GeoDataFrame(IQR_db500_df, geometry='geometry')
InΒ [24]:
IQR_db500_df = pd.concat( [cluster_data,merged_all_geom[['geometry']]], axis=1)
IQR_db500_df = gpd.GeoDataFrame(IQR_db500_df, geometry='geometry')
IQR_db500_df = IQR_db500_df.loc[IQR_db500_df['city']=='Leicester']

Now, let's plot the same variables as before, but in their distance-band IQR format. We can see that it has generated clusters characteristing larger areas of the city, whilst corresponding closely to the general spatial patterns observed from the original raw values.

InΒ [26]:
fig, axs = plt.subplots(2, 3, figsize=(24, 16))

# Define a function to style each subplot
def style_subplot(ax, title):
    ax.set_title(title, fontsize=16)
    ax.axis('off')  # This removes all axis lines, ticks, and labels

# Plot each subplot with reduced point size and custom styling
variables = [
    "closeness2000_IQR_db500", "cds_len_IQR_db500", "neighbor_distance_IQR_db500",  'linearity_IQR_db500',
    "width_deviation_IQR_db500", "squarecompactness_IQR_db500"
]

for i, var in enumerate(variables):
    row = i // 3
    col = i % 3
    if var:  # Only plot if there's a variable name
        IQR_db500_df.plot(var, ax=axs[row, col], scheme='natural_breaks', legend=True)
        style_subplot(axs[row, col], var.replace("_", " ").title())
    else:
        axs[row, col].axis('off')  # Turn off empty subplots

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()
No description has been provided for this image

Why have we done this?ΒΆ

These new, spatially smoothed, variables will serve as input for the cluster analysis. The aim is to combine several morphological variables to create clusters that describe the differing morphologies of each city. Additional built environment variables are also incorporated, including measures of accessibility to points of interest, housing type and population density. These steps are shown in the following post.